Towards discriminative lexicon optimization

Authors

  • Hauke Schramm
  • Peter Beyerlein
Abstract

Much work has been done on deriving the pronunciation dictionary automatically from training data. These attempts focused mainly on maximum likelihood or similar techniques. Because the pronunciation process is complex and highly variable, it is difficult to find an adequate pronunciation model, and any such model will deviate from the truth. Hence, the application of maximum likelihood techniques is likely to be suboptimal. For this reason we present an approach in which the pronunciation model is learned discriminatively from data. The corresponding theory utilizes (1) probabilistic weighting of pronunciation variants of words and (2) discriminative model combination (DMC) based on Viterbi approximations. We will show that the derived theory adjusts the weighting of pronunciation variants with respect to the word error rate, to the frequency of occurrence of the specific pronunciation in the training data, and to the likelihood of the acoustic observation sequence given the pronunciation.
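As a rough illustration of how weighted pronunciation variants interact with a Viterbi approximation, the sketch below scores each variant by a scaled log prior plus an acoustic log-likelihood and keeps only the best variant instead of summing over all of them. This is not the authors' formulation: the function name `viterbi_word_score`, the single scaling factor `lam`, the phone strings, and all numeric values are hypothetical and chosen only for the example.

```python
import math

def viterbi_word_score(variants, lam=1.0):
    """Pick the best-scoring pronunciation variant of one word.

    variants: list of (phones, prior, acoustic_log_likelihood) tuples, where
              prior is an assumed variant weight p(variant | word).
    lam:      assumed scaling factor on the log prior (hypothetical).
    Returns (phones, score) of the maximizing variant (Viterbi approximation:
    max over variants rather than a sum).
    """
    best_phones, best_score = None, -math.inf
    for phones, prior, acoustic_ll in variants:
        score = lam * math.log(prior) + acoustic_ll
        if score > best_score:
            best_phones, best_score = phones, score
    return best_phones, best_score

# Hypothetical variants of one word with made-up priors and acoustic scores.
variants = [
    ("t ah m ey t ow", 0.7, -42.0),
    ("t ah m aa t ow", 0.3, -40.5),
]

# With lam = 1 the acoustically better variant wins; with a larger lam the
# more frequent variant wins.
low_weight_choice = viterbi_word_score(variants, lam=1.0)
high_weight_choice = viterbi_word_score(variants, lam=5.0)
```

The point of the toy example is only that the chosen variant depends on how strongly the variant weights count against the acoustic evidence, which is the kind of trade-off a discriminative criterion would tune.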

Similar resources

Lexicon Optimization for Automatic Speech Recognition based on Discriminative Learning

In agglutinative languages such as Japanese and Uyghur, selection of lexical unit is not obvious and one of the important issues in designing language model for automatic speech recognition (ASR). In this paper, we propose a discriminative learning method to select word entries which would reduce the word error rate (WER). We define an evaluation function for each word by a set of features and ...

Supervised Bilingual Lexicon Induction with Multiple Monolingual Signals

Prior research into learning translations from source and target language monolingual texts has treated the task as an unsupervised learning problem. Although many techniques take advantage of a seed bilingual lexicon, this work is the first to use that data for supervised learning to combine a diverse set of signals derived from a pair of monolingual corpora into a single discriminative model....

Word Level Discriminative Training for Handwritten Word Recognition

Word level training refers to the process of learning the parameters of a word recognition system based on word level criteria functions. Previously, researchers trained lexicon-driven handwritten word recognition systems at the character level individually. These systems generally use statistical or neural based character recognizers to produce character level confidence scores. In the case of...

Pattern-Responsive Lexicon Optimization

In this paper, we show that current interpretations of Lexicon Optimization (Prince and Smolensky 1993), in particular that of Archiphonemic Underspecification (Inkelas 1995), incorrectly predict the distribution of underspecification in lexical entries. We present cases from three vowel harmony languages in which speakers treat harmonic and disharmonic roots differently under reduplication. Th...

The KIT Translation Systems for IWSLT 2013

In this paper, we present the KIT systems participating in all three official directions, namely English→German, German→English, and English→French, in translation tasks of the IWSLT 2013 machine translation evaluation. Additionally, we present the results for our submissions to the optional directions English→Chinese and English→Arabic. We used phrase-based translation systems to generate the ...

Publication date: 2001